Senseaudio voice picker followup by QWERTY0205 · Pull Request #2260 · nexu-io/open-design

QWERTY0205 · 2026-05-19T11:20:44Z

Fixes #

Why

What users will see

Surface area

UI — new page / dialog / panel / menu item / setting / empty state in apps/web or apps/desktop (including Electron menu bar)
Keyboard shortcut — new or changed
CLI / env var — new od subcommand or flag, new tools-dev / tools-pack / tools-pr flag, or new OD_* env var
API / contract — new /api/* endpoint, new SSE event, or changed shape in packages/contracts
Extension point — new entry under skills/, design-systems/, design-templates/, or craft/, or change to the skills protocol
i18n keys — added new translation keys (see TRANSLATIONS.md for the locale workflow)
New top-level dependency — adding any new entry to the root package.json (dependencies or devDependencies); workspace-package package.json files are out of scope. Include a paragraph on what we get vs. what bytes we ship (see CONTRIBUTING.md → Code style)
Default behavior change — changes what existing users experience without opting in (default model, default setting, file/SQLite schema, auto-network on startup, auto-install)
None — internal refactor, docs, tests, or translation update only

Screenshots

Bug fix verification

Validation

Speech projects using `senseaudio-tts` had no way to discover the voices a SenseAudio account can synthesise: the only escape hatch was for the user to paste a raw voice_id into the New Project panel or accept the daemon's default. Add an ElevenLabs-style picker so the agent can present a dropdown of the account's available personas and route to the right variant on dispatch.

Daemon
- `senseaudio-voices.ts` fetches `POST /v1/get_voice`, validates the base_resp envelope, and shapes the response into `Record<prefix, { name, description, variants }>`. The prefix (`male_0028`) keys 1:1 to a persona; colliding prefixes (`female_0030_*`) get keyed by full voice_id instead. The only metadata the API does not return (variant suffix → emotion label) is inlined as a `VARIANT_LABELS` const sourced from docs.senseaudio.cn. 10-min cache by api-key fingerprint. Shaping is wrapped in try/catch so an API field rename returns an empty catalogue (and the prompt falls back to the error path) instead of crashing the daemon.
- `GET /api/media/providers/senseaudio/voices` exposes the catalogue.

Web
- `apps/web/src/providers/senseaudio-voices.ts` mirrors the daemon shape with defensive normalisation.
- `ProjectView` wires the fetch through the BYOK compose path so both daemon and BYOK turns get the same catalogue.

Prompt
- New `senseAudioCatalogue` field on `ComposeInput` (contracts + daemon mirror). `renderSenseAudioPickerInstructions` emits a short bullet instruction, fixed `title` / `description` / `submitLabel` defaults the agent reuses verbatim (localised to the brief language), per-option label rules, post-submit variant-swap logic, and the catalogue JSON. Errors are sanitised through a `formatSenseAudioCatalogueErrorForPrompt` helper that classifies missing-key vs HTTP status code paths.
- Localisation lives in the agent: option labels and form copy get translated into the user's brief language at emit time; voice_ids stay verbatim.

Tests
- `senseaudio-voices.test.ts` covers shape conversion, prefix collisions, hardcoded variant labels, the `通用` fallback, the base_resp error envelope, caching, missing-credentials early exit, and an "API field rename returns empty catalogue" defence.
- `system-prompt-senseaudio-voices.test.ts` covers picker injection triggered by `audioModel=senseaudio-tts`, the sanitised error path, and the missing-key Settings hint.
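A minimal sketch of the prefix-keyed shaping and the try/catch defence described above. The `RawVoice` field names, the suffix-stripping rule, and the collision test (one prefix shared by different persona names) are assumptions for illustration; the real `/v1/get_voice` payload and the PR's `senseaudio-voices.ts` may differ.

```typescript
// Hypothetical raw entry; field names are assumed, not taken from the SenseAudio API.
interface RawVoice {
  voice_id: string; // e.g. "male_0028_a"
  voice_name: string; // persona display name
  description?: string;
}

interface Persona {
  name: string;
  description: string;
  variants: { voiceId: string; label: string }[];
}

type Catalogue = Record<string, Persona>;

// "male_0028_a" -> "male_0028"; ids without a letter suffix are returned unchanged.
function prefixOf(voiceId: string): string {
  const match = /^(.+_\d+)_[a-z]+$/i.exec(voiceId);
  return match ? match[1] : voiceId;
}

function shapeCatalogue(raw: unknown, labels: Record<string, string> = {}): Catalogue {
  try {
    if (!Array.isArray(raw)) return {};
    const voices = raw as RawVoice[];

    // Pass 1: collect the distinct persona names seen under each prefix.
    const namesByPrefix: Record<string, string[]> = {};
    for (const v of voices) {
      if (typeof v.voice_id !== "string" || typeof v.voice_name !== "string") {
        throw new TypeError("unexpected voice entry shape");
      }
      const p = prefixOf(v.voice_id);
      const names = namesByPrefix[p] || (namesByPrefix[p] = []);
      if (names.indexOf(v.voice_name) === -1) names.push(v.voice_name);
    }

    // Pass 2: key by prefix, or by full voice_id when the prefix collides (assumed rule).
    const out: Catalogue = {};
    for (const v of voices) {
      const p = prefixOf(v.voice_id);
      const key = namesByPrefix[p].length > 1 ? v.voice_id : p;
      const persona =
        out[key] || (out[key] = { name: v.voice_name, description: v.description || "", variants: [] });
      persona.variants.push({ voiceId: v.voice_id, label: labels[v.voice_id] || v.voice_name });
    }
    return out;
  } catch {
    // A renamed or malformed field yields an empty catalogue instead of a crash.
    return {};
  }
}
```

With this sketch, two `male_0028_*` entries that share a persona name collapse under the key `male_0028`, while `female_0030_a` and `female_0030_b` with different names are each keyed by their full voice_id.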

…-picker # Conflicts: # apps/daemon/src/prompts/system.ts # packages/contracts/src/prompts/system.ts

The variant suffix → emotion label map (e.g. female_0033_b → "开心") is documented on docs.senseaudio.cn/guides/voice/catalog.md but not returned by the /v1/get_voice API, so the original PR hardcoded a 50-line table that drifts every time SenseAudio adds a persona.

Replace the hardcoded table with a one-time per-process scraper:

1. fetch docs.senseaudio.cn/guides/voice/catalog.md (24h cached)
2. regex `<voice_id>` `(<label>)` across the page → labels map
3. shapeCatalogue() now consults the scraped map first

Fallback chain when shaping each variant entry:

- primary: doc-scraped label (fresh, authoritative)
- secondary: BACKUP_VARIANT_LABELS hardcoded table (used iff the doc fetch fails or yields zero matches; cached for only 5 min so the live doc is retried quickly once it recovers)
- tertiary: per-voice voice_name from the API (used iff a specific voice_id is missing from both label sources; never a static "通用" placeholder anymore)

Net effect against today's prod catalogue: the doc surfaces 111 voice_id labels vs 82 in the hardcoded backup, so 29 voices that previously fell back to "通用" (female_0006_a "深情", male_0023_a "平稳", male_0004_a "平稳", …, including popular personas) now carry their real label without anyone needing to update source code.
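The scrape-then-fallback flow above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the page format matched by the regex (a backticked voice_id followed by a parenthesised label), the single backup entry, and the helper names `parseLabels`, `pickLabelSource`, and `resolveVariantLabel` are all hypothetical. It also assumes the backup table is consulted only when the scrape fails or is empty, as the commit message states.

```typescript
type LabelMap = Record<string, string>;

// Stand-in for the PR's BACKUP_VARIANT_LABELS; this single entry is illustrative only.
const BACKUP_VARIANT_LABELS: LabelMap = { female_0033_b: "开心" };

const DAY_MS = 24 * 60 * 60 * 1000;
const FIVE_MIN_MS = 5 * 60 * 1000;

interface LabelSource {
  labels: LabelMap;
  ttlMs: number;
  origin: "docs" | "backup";
}

// Extract `voice_id` (label) pairs from the docs page (assumed markdown format).
function parseLabels(markdown: string): LabelMap {
  const re = /`([A-Za-z]+_\d+(?:_[A-Za-z]+)?)`\s*[((]([^()()]+)[))]/g;
  const out: LabelMap = {};
  let m: RegExpExecArray | null;
  while ((m = re.exec(markdown)) !== null) {
    out[m[1]] = m[2].trim();
  }
  return out;
}

// Pick the label table for this process and how long it may be cached.
function pickLabelSource(scraped: LabelMap | null): LabelSource {
  if (scraped && Object.keys(scraped).length > 0) {
    return { labels: scraped, ttlMs: DAY_MS, origin: "docs" };
  }
  // Short TTL so the live doc is retried soon after it recovers.
  return { labels: BACKUP_VARIANT_LABELS, ttlMs: FIVE_MIN_MS, origin: "backup" };
}

// Last step of the chain: fall back to the API's voice_name, never a static placeholder.
function resolveVariantLabel(voiceId: string, voiceName: string, source: LabelSource): string {
  return source.labels[voiceId] || voiceName;
}
```

A caller would pass `null` to `pickLabelSource` when the docs fetch fails, which selects the backup table with the 5-minute TTL.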

The picker dropdown showed all ~12 personas in catalogue order with no UX hint about which ones actually fit the user's brief. The user had to read every option's label end-to-end to decide.

Add a REQUIRED step in renderSenseAudioPickerInstructions: before composing the dropdown, the agent scores each persona for fit against the brief (gender, age, register, tone, scenario keywords), then marks the top 3 with prefix glyphs included in the localised label:

★ #1 best match
◆ #2
◇ #3
(none for the rest)

Top-3 options sort to the front of the dropdown in 1→2→3 order; the remainder follow in original catalogue order. Glyphs are universal (not zh-CN-only) so the localisation rule for the rest of the label keeps working unchanged.
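In the PR the scoring and reordering are carried out by the agent following prompt instructions, not by code. The sketch below only illustrates the resulting order, assuming numeric fit scores are already available; `rankOptions`, `PickerOption`, and the `scores` map are hypothetical names.

```typescript
interface PickerOption {
  voiceId: string;
  label: string;
}

// Put the three best-scoring options first with rank glyphs; keep the rest in catalogue order.
function rankOptions(
  options: PickerOption[],
  scores: Record<string, number>,
  prefixes: [string, string, string] = ["★", "◆", "◇"],
): PickerOption[] {
  const indexed = options.map((option, index) => ({
    option,
    index,
    score: scores[option.voiceId] || 0,
  }));

  // Highest score first; ties keep their catalogue order.
  const top = indexed
    .slice()
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .slice(0, 3);

  const topIds = top.map((t) => t.option.voiceId);
  const ranked = top.map((t, rank) => ({
    voiceId: t.option.voiceId,
    label: `${prefixes[rank]} ${t.option.label}`,
  }));

  // Everything outside the top 3 stays unprefixed, in its original position order.
  const rest = options.filter((o) => topIds.indexOf(o.voiceId) === -1);
  return ranked.concat(rest);
}
```

Because the glyph set is a parameter, passing a different three-element tuple changes only the prefixes and leaves the ordering logic untouched.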

Update the variant-label-fallback test to match the new behaviour introduced in the scrape-from-docs commit: voice_ids missing from both the doc-scraped map and the BACKUP_VARIANT_LABELS hardcoded backup now fall back to the persona's voice_name, not the static "通用" placeholder. The fetch mock now serves a 404 for the docs URL so the daemon deterministically takes the backup path during the test, which was the implicit assumption in the previous version.

The previous prefix set (★ ◆ ◇) was too geometrically similar: the filled and open diamonds were hard to tell apart at a glance, and the black star + diamonds combo did not visually communicate "ranking". Switch to the universal medal emojis. They map onto the gold/silver/bronze metaphor users already recognise from sports, awards, and leaderboards, and remain locale-neutral so the rest of the label can still be translated freely.

lefarcen

Hey @QWERTY0205! Thanks for picking up the SenseAudio voice-picker follow-up — the changed areas make the intended direction visible, but the PR description is still mostly template placeholders.

Could you fill in ## Why, ## What users will see, ## Surface area, and ## Validation before pool review gets deep into this? This appears to touch UI plus daemon/API-contract voice discovery, so ticking the relevant surface-area boxes and adding the validation commands/screenshots will help reviewers scope the user-facing path quickly. Also, please replace the dangling Fixes # line with a real issue number or remove it if there is no linked issue.

Related: #2044 by @Fl0rencess720 is already open against the same SenseAudio voice-picker area and touches the same 12 files. You two may want to compare approaches; the maintainer team will decide which path lands.

Siri-Ray

@QWERTY0205 thanks for the thoughtful follow-up on the SenseAudio picker. I found one BYOK/API-mode consistency issue to consider; it should be straightforward to fix by keeping the two prompt composers in sync.

_{🔁 Powered by Looper · runner=reviewer · agent=codex · An autonomous AI dev team for your GitHub repos.}

Siri-Ray · 2026-05-19T11:28:50Z

+  lines.push('  description: "Pick a voice for the read."');
+  lines.push('  submitLabel: "Use voice"');
+  lines.push('');
+  lines.push('For each dropdown option:');


This contracts-side SenseAudio picker text is now missing the top-3 medal-ranking step that was added to the daemon composer in apps/daemon/src/prompts/system.ts (Top-3 highlighting, 🥇 / 🥈 / 🥉, and sorting the ranked options first). That matters because ProjectView imports composeSystemPrompt from @open-design/contracts for the web/BYOK compose path while daemon-mode runs use apps/daemon/src/prompts/system.ts, so SenseAudio projects behave differently depending on mode: daemon users get the ranked picker instructions, but BYOK/API-mode users only get the unranked dropdown guidance here. Please mirror the medal-ranking block in this contracts composer as well, and add/update a contracts prompt test that asserts the 🥇, 🥈, 🥉 prefixes so future follow-ups cannot drift the two prompt copies again.
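One possible shape for the requested drift guard, sketched under assumptions: `renderMedalRankingBlock`, `composeContractsPickerInstructions`, `missingMedals`, and the instruction wording are hypothetical and not taken from the PR. The idea is that the contracts composer emits the same ranking block as the daemon composer, and a test fails if any of the three medal prefixes is absent from the composed prompt.

```typescript
const MEDALS = ["🥇", "🥈", "🥉"] as const;

// Hypothetical helper: the top-3 ranking instructions as prompt lines.
function renderMedalRankingBlock(): string[] {
  return [
    "Before composing the dropdown, score each persona for fit against the brief.",
    `Prefix the top 3 option labels with ${MEDALS.join(" / ")} in rank order.`,
    "Sort the ranked options first; keep the rest in catalogue order.",
  ];
}

// Stand-in for the contracts-side composer; only the quoted diff lines come from the PR.
function composeContractsPickerInstructions(): string {
  const lines: string[] = [];
  lines.push('  description: "Pick a voice for the read."');
  lines.push('  submitLabel: "Use voice"');
  lines.push("");
  for (const line of renderMedalRankingBlock()) lines.push(line);
  lines.push("");
  lines.push("For each dropdown option:");
  return lines.join("\n");
}

// Returns the medal prefixes a prompt fails to mention; an empty array means no drift.
function missingMedals(prompt: string): string[] {
  return MEDALS.filter((medal) => prompt.indexOf(medal) === -1);
}
```

A contracts prompt test could then assert that `missingMedals(composeContractsPickerInstructions())` is empty, and run the same check against the daemon composer's output.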

lefarcen · 2026-05-21T03:59:58Z

Hey @QWERTY0205, quick lifecycle update: #2044 has now folded in this SenseAudio voice-picker follow-up, is approved on the current head, and covers the same files/scope we cross-linked above.

Unless you see a piece here that is still missing from #2044, it is probably best to close this PR as superseded so review effort stays focused there. Thanks again for pushing the follow-up ideas — they helped clarify the final scope.

github-actions · 2026-05-22T14:27:13Z

@QWERTY0205 friendly reminder: this PR has been waiting on an author response for more than 3 days after reviewer or maintainer feedback.

When you have a chance, please reply here or push an update. To keep the queue manageable, PRs with no author activity for more than 5 days after feedback may be closed automatically, but they can be reopened when work resumes.

Fl0rencess720 and others added 7 commits May 18, 2026 13:59

Merge remote-tracking branch 'origin/main' into feat/senseaudio-voice…

3ad68b4

…-picker # Conflicts: # apps/daemon/src/prompts/system.ts # packages/contracts/src/prompts/system.ts

ci: trigger re-run

f4c863d

lefarcen requested a review from Siri-Ray May 19, 2026 11:22

lefarcen added labels May 19, 2026: size/XL (PR changes 700-1500 lines), risk/high (High risk: apps/desktop, daemon, auth, migration, workflows, package deps), type/enhancement (Enhancement to existing feature)

lefarcen mentioned this pull request May 19, 2026

feat(senseaudio): voice picker for SenseAudio TTS speech projects #2044

Open

3 tasks

lefarcen reviewed May 19, 2026

Siri-Ray reviewed May 19, 2026

QWERTY0205 wants to merge 7 commits into nexu-io:main from QWERTY0205:senseaudio-voice-picker-followup

4 participants
